[FLINK-37443] Add returns() method to DataStream V2 API for specifying output types with lambda expressions#26284
Conversation
…nd using .returns method
|
@flinkbot run azure |
|
@reswqa can you please review it? |
|
cc @codenohup |
| </dependencies> | ||
| <dependency> | ||
| <groupId>org.apache.flink</groupId> | ||
| <artifactId>flink-core</artifactId> |
There was a problem hiding this comment.
The API modules should not depend on non-API modules, because this would require users to depend on the flink-core module when developing DataStream V2 programs, but flink-core should only be used at application runnning.
You might consider moving TypeInformation to the API module, such as core-api. However, I understand that this is challenging due to its dependencies on numerous other classes.
There was a problem hiding this comment.
I can copy it, but will it be okay to have these duplicate code in codebase? Or is there any other better way to do that?
There was a problem hiding this comment.
The dependency of flink-core module through usingTypeInformation is only used in the interfaces located in flink-datastream-api, as the implementations are defined in flink-datastream module which are already using flink-core module.
I think it does not make much sense to copy a large number of files and maintain it just for a small feature and that too only for defining interfaces. I am open to other ideas.
There was a problem hiding this comment.
The dependency of
flink-coremodule through usingTypeInformationis only used in the interfaces located inflink-datastream-api, as the implementations are defined inflink-datastreammodule which are already usingflink-coremodule.I think it does not make much sense to copy a large number of files and maintain it just for a small feature and that too only for defining interfaces. I am open to other ideas.
Sorry for the delayed reply. I think we can introduce a class, such as TypeHint, at the API level that allows users to define a class with generics. Subsequently, this TypeHint can be converted into TypeInformation at the implementation level.
During this process, you can leverage TypeDescriptor to describe common types within the flink-datastream-api. Additionally, it would be advantageous to incorporate the tuple type into TypeDescriptor.
| * @param typeInfo type information as a return type hint | ||
| * @return This operator with a given return type hint. | ||
| */ | ||
| NonKeyedPartitionStream<T> returns(TypeInformation<T> typeInfo); |
There was a problem hiding this comment.
KeyedPartitionStream and GlobalStream also need returns method.
There was a problem hiding this comment.
okay I will add it,
There was a problem hiding this comment.
okay I will add it,
BroadcastStream also need returns method
There was a problem hiding this comment.
Can it be just part of datastream interface? Then it will extend all other streams by default.
There was a problem hiding this comment.
Can it be just part of datastream interface? Then it will extend all other streams by default.
Thank you for the reminder. You can add the withReturnType method to the ProcessConfigurable interface, as it proves to be useful in most scenarios. Regarding situations with two output streams, users can obtain both streams individually and then apply the withReturnType method to each. Here's an example to illustrate:
NonKeyedPartitionStream.ProcessConfigurableAndTwoNonKeyedPartitionStream<Integer, String>
twoOutputStream = ...;
twoOutputStream.getFirst().withReturnType(TypeDescriptor.INT).process(...);
twoOutputStream.getSecond().withReturnType(TypeDescriptor.String).process(...);
| * DataStream API. It is currently in the experimental stage and is not fully available for | ||
| * production. | ||
| */ | ||
| public class WordCountUsingLambda { |
There was a problem hiding this comment.
Maybe you should rewrite the previous example to use lambda expressions instead of rewriting an almost identical example.
There was a problem hiding this comment.
yes, I was thinking the same !
codenohup
left a comment
There was a problem hiding this comment.
Hi, @nilmadhab
Thank you for your contribution! I've left some comments, PTAL.
…rocessConfigurableAndKeyedPartitionStreamImpl
|
Any other examples I can modify or test case to add? @codenohup |
|
@flinkbot run azure |
It is best to modify the original example/test case. If it does not meet your needs, you can also add a new example/test case. |
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
Brief change log
(for example:)
Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation